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The reconstruction of the CMBR power spectrum from a map represents a major computational 
challenge to which much effort has been applied. However, once the power spectrum has been 
recovered there still remains the problem of extracting cosmological parameters from it. Doing 
this involves optimizing a complicated function in a many dimensional parameter space. Therefore 
efflcient algorithms are necessary in order to make this feasible. We have tested several different 
types of algorithms and found that the technique known as simulated annealing is very effective 
for this purpose, ft is shown that simulated annealing is able to extract the correct cosmological 
parameters from a set of simulated power spectra, but even with such fast optimization algorithms, 
a substantial computational effort is needed. 

PACS numbers: 98.70.Vc, 98.80.-k, 02.60.Pn 



I. INTRODUCTION 



In the past few years it has been realized that the Cos- 
mic Microvirave Background Radiation (CMBR) holds in- 
formation about virtually all relevant cosmological pa- 
rameters [P^. The shape and amplitude of the fluc- 
tuations in the CMBR are strongly dependent on such 
parameters as fl, Hq etc. [||. Given a sufficiently accu- 
rate map of fluctuations it should therefore in principle 
be possible to extract information on the values of these 
parameters. In general, it is customary to describe the 
fluctuations in spherical harmonics 
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where the aim coefficients are related to the power spec- 
trum by Ci — {alm'^irnjm- For purely Gaussian fluctu- 
ations the power spectrum contains all statistical infor- 
mation about the fluctuations 

The CMBR fluctuations were first detected in 1992 
by the COBE satellite g, and at present the COBE 
measurements together with a number of smaller scale 
experiments Q make up our experimental knowledge of 
the CMBR power spectrum. These data are not of suffl- 
cient accuracy to really pin down any of the cosmological 
parameters, but the next few years will hopefully see an 
explosion in the amount of experimental data. Two new 
satellite projects, the American MAP and the European 
PLANCK [|, are scheduled and are designed to measure 
the power spectrum precisely down to very small scales 
{I ~ 1000 for MAP and / ~ 2000 for PLANCK). This 
should yield sufficient information to determine almost 
all relevant cosmological parameters. 

However, using CMBR data to extract information 
about the underlying cosmological parameters will rely 
heavily on our ability to handle very large amounts of 



data (Refs. and references therein). The first prob- 

lem lies in constructing a power spectrum from the much 
larger CMBR map. If there are m data points, then the 
power spectrum calculation involves inversion of m x m 
matrices (an order rrv^ operation). For the new satellite 
experiments rm? is prohibitively large |Q-|l^, and much 
effort has been devoted to finding methods for reduc- 
ing this number by exploiting inherent symmetries in the 
CMBR [^,||. However, once the power spectrum has been 
constructed the troubles are not over. Then the space of 
cosmological parameters has to be searched for the best- 
fit model. If there are n free cosmological parameters, 
each sampled by q points, then the computational time 
scales as and, if n is large, the problem becomes in- 
tractable. In the present paper we assume that a power 
spectrum has been constructed, so that only the prob- 
lem of searching out the cosmological parameter space 
remains. 

In general, parameter extraction relies on the fact that 
for Gaussian errors it is possible to build a likelihood 
function from the set of measurements llSl 
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where = (fJ, fif,, i7o, n,T, . . .) is a vector describing the 
given point in parameter space, a; is a vector containing 
all the data points. This vector can represent either the 
CMBR map, or the reconstructed power spectrum points. 
C(0) is the data covariance matrix. 

Assuming that the data points are uncorrelated, so 
that the data covariance matrix is diagonal, this can be 
reduced to the simple expression, C cx ' , where 
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is a x^-statistics and A'max is the number of power spec- 
trum data points flir 
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In order to extract parameters from the power spec- 
trum we need to minimize over the multidimensional 
parameter space. 

In general there is no easy way of doing this. The 
topology of could be very complicated, with several 
different local minima. However, let us for now ignore 
this possible problem and assume that the function is 
unimodal. Then there exist a vast number of algorithms 
for extremizing the function. The most efficient methods 
for optimization usually depend on the ability to calcu- 
late the gradient of the objective function, . These 
methods work on completely general continuously differ- 
entiable functions, but under the right assumptions, 
possesses qualities which makes it possible to improve 
on the simple gradient methods. In general, the second 
derivative of with respect to parameters i and j is 
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Sufficiently close to the minimum of , the second term 
in the equation above should be small compared with 
the ffist. In practice this means that we get the second 
derivative information "for free" by just calculating the 
first derivative. Therefore, if we assume that the start- 
ing point for the optimization is sufficiently close to the 
true minimum, an algorithm utilising second-derivative 
information should converge much faster than a gradient 
method. The most popular algorithm of this type is the 
Levenberg-Marquardt method iQ. Note, however, that 
far away from the minimum, the above expression for 
the second derivative can be very wrong and cause the 
algorithm to converge much slower. 

Both gradient and second order algorithms are typ- 
ically very efficient. However, there are several weak- 
nesses: 1) They rely on our ability to calculate deriva- 
tives of . Although in principle this is no problem, 
numerical experiments have shown that results for this 
derivative are not always reliable ||^. For instance, the 
numerical code for calculating power spectra, CMBFAST 
[p^ , is fundamentally different for open and flat cosmolo- 
gies, and has no implementation of closed models, so that 
the derivative of with respect to fio is not reliable 
at fig = 1. This is just one example, but the problem 
is generic as soon as points are located sufficiently near 
parameter boundaries. 2) The next problem is related 
to the fact that the above methods in general works as 
steepest descent methods. This means that they are very 
easily fooled into taking the shortest path towards some 
local minimum which needs not be global. If there are 
either many local minima or the topology of x^ is com- 
plicated with many near degeneracies, then the above 
gradient-based methods are likely to perform poorly. Un- 
fortunately this might easily be the case with any given 
realization of the CMBR power spectrum. 



II. STOCHASTIC OPTIMIZATION 
A. Multistart algorithms 

The above caveats lead us to look for more robust 
methods for finding the true minimum of x^- As soon as 
we are dealing with multimodal functions it is clear that 
we cannot contend ourselves with just running an opti- 
mization scheme based on the above method with just 
one starting point. The simplest possible improvement 
on the above method is a Monte Carlo multi start algo- 
rithm. In this case a starting point is chosen at random 
in the parameter space, and optimization is performed, 
using either a gradient or a second-order method. After 
the algorithm converges a new starting point is chosen. 
This method has the advantage that it converges to the 
global minimum in the asymptotic limit of infinite com- 
putational time. However, it is easy to improve on it, 
because the simple multistart algorithm will detect the 
same local minimum many times uncritically. 

The multi level single linkage (MLSL) algorithm 
tries to alleviate this problem by mapping out the basins 
connected with the different local minima. If it detects 
that a trial point lies within a basin which has already 
been mapped, then the point is rejected. Depending on 
the type of objective function this algorithm can perform 
exceedingly well Q 

In what follows we use the simple implementation of 
the MLSL algorithm provided by Locatelli ||T^. First, 
we need the following deflnition: Let Xmax and Xmin be 
the maximum and minimum allowed value of parameter 
i. Then define a new parameter q= (x — a;min)/(a;max — 
a^min), so that q € [0, 1]. We use this new parameter q in 
the algorithm below, so that all cosmological parameters 
are treated on an equal footing and the allowed region 
is a simple hypercube spanning all values from to 1 in 
i?". The algorithm is then devised as follow: 

1) At each step, k, pick out N sample points from the 
allowed region and calculate the objective function. 

2) Sort the whole sample of kN points in order of in- 
creasing x^ value and select the ^kN points with small- 
est values. 

3) For all of these points, run optimization on given point 
9, iff 

- No point y exists so that 

d{q, y) <a and x^iv) < X^{q) 

- d{q,S) > d 

- Optimization was not previously applied to q. 

Optimization is performed with a gradient method. 

4) Proceed to step fc -I- 1. 

In the above, d{q, y) is the Euclidean distance between 
X and y, and S is the set of already discovered local min- 
ima, a and d are predefined distances which should be 
chosen to optimize the rate of finding local minima. They 
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are a measure of how large the basins connected with lo- 
cal minima are in general in that specific problem. The 
above method thus includes a host of different parame- 
ters which should be chosen by the user, N, d, 7 and a. 
This can make it quite troublesome to devise an algo- 
rithm which performs optimally. In our implementation 
we have chosen = 10, 7 = 0.2, d — 0.1 and a = 0.1. 
Note that this is somewhat in conflict with the definition 
given by Rcfs. ||l^,0, in that a should really be a quan- 
tity which depends on k, but in order to obtain a simple 
implementation we have used the above values. 



B. Simulated annealing 

A completely different method, which in the next sec- 
tion is shown to be very effective for minimization on 
CMBR power spectra, is simulated annealing. 

The method of simulated annealing was first intro- 
duced by Kirkpatrick et al. in 1983 [|l^,^. It is based 
the cooling behaviour of thermodynamic systems. Con- 
sider a thermodynamic system in contact with a heat 
bath at some temperature, T. If left for sufficiently long 
the system will approach thermal equilibrium with that 
temperature. The heat bath is then cooled, and if this is 
done slowly enough the system maintains equilibrium in 
the cooling phase, and finally as T — > settles into the 
true ground state, the state with the lowest possible en- 
ergy. This is very similar to global searches for minima of 
functions and simulated annealing relies on the fact that 
the function to be minimized can be considered as the 
energy of a thermodynamic system. If the system is then 
cooled from a very high "temperature" towards T = it 
should find the global minimum, given that it maintains 
thermal equilibrium at all times. 

In practise one lets the system jump around in pa- 
rameter space at random. Given a starting point i, 
a trial point is sought according to some prescription, 
and is then either accepted or rejected according to the 
Metropolis acceptance probability |^l| 

^accept(z + 1) - I g-(£.+ i-£;.)/T fo^ ^^^^ > , (5j 

where, in our case E ^ x^- There are very many similar- 
ities between this and thermodynamic systems, at high 
temperatures the system visits all states freely, while at 
low temperatures it can visit only states very close to 
the minimum. For instance it has been shown that by 
using the above criterion the system asymptotically ap- 
proaches the Boltzmann distribution, given that it is kept 
at constant temperature asymptotically long ||2^. Also, 
if a system undergoes simulated annealing with complete 
thermal equilibrium at all times then as T — > the en- 
ergy approaches the global minimum |22[ . For absolute 
global convergence to be ensured it is thus necessary to 
allow infinite time at each temperature. 



In order to use simulated annealing for functional op- 
timization it is necessary to specify three things: 

1) A space of all possible system configurations 

2) A cooling schedule for the system 

3) A neighbourhood structure. 

Here, the configuration space is a hypercubc in i?" 
bounded by the limits on the individual parameters. 

The cooling schedule and the neighbourhood structure 
are both something which in general are quite difficult 
to choose optimally Further, they make the scheme 
problem dependent. For this reason adaptive simulated 
annealing procedures have been devised which dynami- 
cally choose the cooling rate and neighbourhood directly 
from the previous iterations in order to maximize the 
thermalisation rate [p3| . The problem with this approach 
is that the thermodynamic behaviour is no longer well- 
defined. For instance the approach to a Boltzmann dis- 
tribution is not ensured. 

In the present work we choose a relatively simple cool- 
ing schedule and neighbourhood structure, neither of 
which are adaptive. In practise we start with an initial 
temperature, Tq, which is then lowered exponentially by 
the following criterion T^+i = aTi, where a is some con- 
stant. When the temperature reaches a final value Ti the 
algorithm stops. In this way a is a function of the total 
number of steps, Ns, given as a = (Ti/Tq)^/^'. 

The neighbourhood search is devised so that at high 
temperatures the system is prone to make large jumps 
whereas at lower temperatures it mostly searches the 
nearest-neighbour points. In our specific model the pa- 
rameter space consists of a vector, x, of n free parame- 
ters, bounded from below by the vector, TCmin, and from 
above by Xmax- Let iteration point i have the value 
for the parameter labelled /3. Then the value of this pa- 
rameter at iteration z-l-1 has acceptance probability given 
as 

P[(x0),+i] cx e-l(-'')'+^-(-'')•l/^-^ (6) 

where 

n^fi = A^iixpU.^ - (x/3)„,in](T/ro)l/2, (7) 

and A/} is some constant, chosen to yield a good conver- 
gence rate. The above probability is set to if (a;/3)i+i 
is outside the allowed interval for the given parameter. 
This criterion for picking out trial points has the desired 
quality that it makes large jumps at high temperature 
and progressively smaller jumps as the temperature is 
lowered. If the objective function depends strongly on /3, 
then Ap should be small, whereas if it is almost indepen- 
dent of f3, Ap should be large. It is well known that x^ is 
almost degenerate in the parameter Qmh^ (2). Therefore 
it is natural to choose Aq^i^2 to be small. In our im- 
plementation we have chosen the following values for the 
control parameters: Tq — 10'', Ti = 2, Aq^^^ = 1/32, 
Ap = l/8for P^n^h^. 
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Note that the method of simulated anneahng was first 
apphed to simulated CMBR data by Knox [g^, for a 
relatively small model with four free parameters. 



III. NUMERICAL RESULTS 

A. Performance of different algorithms 

In order to test the relative efficiency of the different 
optimization schemes we have tried to run minimiza- 
tion on synthetic power spectra. All the power spec- 
tra in the present paper have been calculated by use 
of the publicly available CMBFAST package To 
make calculations not too cumbersome we have restricted 
the calculations to a six-dimensional parameter space, 
characterised by the vector O — (fJ,,,,, ilb, HQ,ns, N,^, Q). 
The model is taken to have flat geometry so that f^A = 
1 — i7„i. We start from an assumed true model with 
9 = (0.5, 0.05, 50, 1, 3, 30 ^K), i.e. fairly close to the cur- 
rently favoured ACDM model |^ . Table I shows the free 
parameters, as well as the allowed region for each. We 
further assume that all CiS up to / = 1000 can be mea- 
sured without noise. That is, the errors are completely 
dominated by cosmic variance, with the error being equal 
to 0,1 
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From underlying statistics we have produced a single re- 
alisation which we take to be the measured power spec- 
trum. 

Since we have iV = 999 synthetic data points, all nor- 
mally distributed, of the data set, relative to the true, 
underlying power spectrum should have a distribution 
with mean TV, and standard error ^/2N , so that 



X 



999 ± 45. 



(9) 



The specific synthetic data set we use has xi = 1090.98, 
i.e., it is within about 2cr of the expected value. If the op- 
timization routine is optimal, then for each optimization 



The average of several optimization runs should prefer- 
ably yield a value which is somewhat below x* ■ We there- 
fore have a measure of whether or not the optimization 
has been successful. 

We have tested four different optimization algorithms 
on a subset of the full six-dimensional parameter space. 
The algorithms are: Simple Monte Carlo multistart with: 
1) gradient optimization method (G), 2) Levenberg- 
Marquardt method (LM), 3) multi level single linkage 
(MLSL), as described in Section Ila, 4) simulated an- 
nealing, as described in Section lib. Algorithms 1-3 use 
optimization routines from the P0RT3 library p6|. 
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TABLE I. The free parameters used in the present analysis, 
as well as the allowed range for each. 



Parameter 



Allowed range 



Q 

h 
n 



5-40 AiK 
0.018-0.49 
0.002-0.030 
0.30-0.75 
0.7-1.3 
1-5 



FIG. 1. The average x found by the different algorithms. 
The data points are for 1) simple gradient method (trian- 
gles), 2) Levenberg-Marquardt method (squares), 3) multi 
level single linkage (crosses), and simulated annealing (dia- 
monds). The top panel shows optimization for the case of 
four free parameters, whereas the bottom panel shows it for 
five parameters. 

In order to make direct comparison between the algo- 
rithms, we have let them run for a fixed number of steps, 
where one step is defined equal to one power spectrum 
calculation. All methods, except simulated annealing, 
use gradient information, which means that additional 
power spectra must be calculated at each iteration. We 
use two sided derivatives, so that to calculate the gradi- 
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ent (and Hessian), we need 2n more calculations, where 
n is the number of cosmological parameters. Fig. 1 shows 
the minimum found by the different algorithms. Each 
point in Fig. 1 stems from a Monte Carlo run of 15 opti- 
mizations. 

Clearly, the MLSL method improves on the simple 
multi start algorithm. The LM algorithm performs bet- 
ter than gradient optimization in some cases, but in other 
cases it is much worse. This is probably due to the 
fact that if the starting point is far away from a local 
minimum then the second derivative may yield false in- 
formation because Eq. does not hold, causing the 
algorithm to converge slower. This weakness could be 
remedied to some extent by diagonalising the matrix 
of second-derivatives (Fisher matrix diagonalisation), so 
that the correlation between different parameters is ap- 
proximately broken. 

However, the most striking feature in Fig. 1 is that 
SA outperforms the other algorithms easily. Most likely 
this is due to the fact that possesses valleys where 
the function has many almost degenerate local minima. 
Note that the likelyhood function does not need to be 
truly multi-modal for this effect to occur. It can happen 
either because the parameter space is constrained so that 
the algorithm takes a path which leads out of the allowed 
space, or because there are small "bumps" on close to 
the global minimum, which cause the gradient algorithms 
to get trapped, is not multimodal in the sense that 
it contains equally good local minima, separated by long 
distances in parameter space. 




200 400 600 800 1000 
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FIG. 2. Four different runs of the simple gradient method 
(G), without multi-start. 

For the case of four free parameters (upper panel), 
most of the algorithms produce acceptable results with 
about 1000 steps, but with five parameters (lower panel), 
about 2000 steps are needed. In both cases, SA needs 
substantially fewer steps than the other algorithms. 



In Fig. 2 we show four different runs of the simple 
gradient-based algorithm without multi-start. In two of 
the cases the algorithm converges towards the global min- 
imum, whereas in the two other it becomes trapped at 
much higher lying points in parameter space. We have 
tested the effect of varying step size in the gradient cal- 
culation and found that the results do not depend on 
this. This figure also shows that the gradient based algo- 
rithms generally converge fairly rapidly (i.e. a few hun- 
dred steps), so that the multi-start algorithm generally 
runs several times even for relatively a relatively small 
number of total steps. 

B. Parameter extraction 

If the minimization succeeds in finding the global 
minimum, then the value found should reflect the un- 
derlying measurement uncertainty. We have performed a 
detailed Monte Carlo study of how well the SA algorithm 
is able to extract parameters from the power spectrum. 
The test goes as follows: First, construct A^mc synthetic 
measured power spectra, as described in the previous sec- 
tion. Then run optimization on each one of these spectra. 
This produces A'mc estimated points in parameter space. 
In order to compare these points with the underlying un- 
certainty, we then need to calculate the estimated stan- 
dard error on the different parameters. This is done by 
the standard method of calculating the Fisher informa- 
tion matrix. At the true point in parameter space, the 
likelihood function should be maximal, so that it should 
have zero gradient. The matrix of second derivatives is 
then given by (Eq. (||)) 

(11) 

The expected error on the estimation of parameter i is 
then given by 

^? ^ {I-'h, (12) 

if we assume that all the relevant cosmological parame- 
ters should be determined simultaneously. The expected 
error on ilm is cr = 0.098, given our assumed measure- 
ment precision. Note that above we have again assumed 
that the only uncertainty in the measurements is from 
cosmic variance. 

We have performed this Monte Carlo test on the 6- 
dimensional parameter space, using 24 different synthetic 
spectra. We have extracted parameters using SA with 
a different number of total steps: 500, 2000 and 4000. 
Fig. 3 shows how the estimated points are distributed 
for the parameter Qrn- We have binned the extracted 
points in bins of width la up to 5a. For the optimiza- 
tion performed with 500 steps the distribution is very 
wide, showing no speciflc centering on the true param- 
eter value. The optimization with 2000 steps extracts 
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values which are centered on the true value, indicative of 
a good optimization. Furthermore, the optimization with 
4000 steps shows little improvement over that with 2000, 
again indicating that the one with 2000 steps is already 
performing optimally. Note that both for 2000 and 4000 
steps the distribution of extracted points is significantly 
wider than the theoretical expectation which was calcu- 
lated assuming a normal distribution with o = 0.098. 
One would expect this to be the case since the proba- 
bility distribution of any given parameter is only normal 
close to the true value, even for a perfect optimization. 
Therefore there are likely to be more outlying points than 
suggested by the normal distribution. 



20 r 




FIG. 3. Values of f2m found by the optimization pro- 
cedure for the 24 Monte Carlo runs. The recovered val- 
ues are shown in bins of width Icr, where a — 0.098 and 
A9 = |r2m, found — 0.5j. The full line is the theoretical expec- 
tation, assuming that errors on i},n are normally distributed. 



normal distribution. We can also calculate for the 
sample 

A^MC ^ 

X9= 51 ^(^t°""d-etruc)'. (14) 

1=1 

This function should be approximately distributed. 
We have calculated and for the sample of extracted 
parameters, to see if it is compatible with the theoretical 
expectations. Table II shows the values found from the 
24 Monte Carlo simulations. The sample mean found by 
the optimization with 500 steps deviates by more than 
7a from the expectation. Again this indicates a poor op- 
timization. The optimizations with 2000 and 4000 steps 
succeed in recovering the true mean to within 2a. As for 
X^, it is much lower for the 2000 and 4000 steps opti- 
mizations than for the 500 steps. However, both are still 
much larger than expected from a normal distribution. 
As mentioned above this has to do with the fact that 
the distribution is not normal far away from the true pa- 
rameter value, so that more outlier points are expected. 
These contribute heavily to x^j so that a larger value can 
be expected, even for a perfect optimization. 

As seen above, even for the small 6 parameter model 
we use, it is necessary on average to calculate more than 
10^ power spectra. Even on a fast computer this is some- 
thing which takes several hours. This must be done each 
time one wants to check how a new proposed cosmological 
model fits the data. This very clearly shows the neces- 
sity of using fast optimization algorithms for parameter 
extraction. 

Note that the models we have calculated are flat and 
without reionization, including either curvature or reion- 
ization significantly slows the CMBFAST code. Also, 
more exotic models like scenarios with decaying neutrinos 
lead to very cumbersome CMBR spectrum calculations 




The above Monte Carlo method was also used by Knox 
in order to test the optimization efficiency for a 
small model with 4 free parameters. 



TABLE II. Recovered mean value and for the 50 Monte 
Carlo runs performed, for the parameter fim . Values in paren- 
theses are the expected theoretical values. 





Steps 


/^sample 


x' 




500 


0.655 (0.50 ± 0.020) 


198.4 (50 ± 10) 




2000 


0.499 (0.50 ± 0.020) 


73.7 (24 ± 6.9) 




4000 


0.474 (0.50 ± 0.020) 


67.7 (24 ± 6.9) 



If we have A^mc Monte Carlo runs, then if the opti- 
mization is perfect one should obtain a sample mean of 
roughly 

Msainplc — Mtruc ^ ^si (-^*^) 

where as — a j^/Nyic for a given parameter if A^mc is 
large and the extracted parameters are drawn from a 
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IV. DISCUSSION AND CONCLUSIONS 

We have tested different methods for ^ minimization 
and parameter extraction on CMBR power spectra. It 
was found that simulated anneahng is very effective in 
this regard, and that it compared very favourably with 
other optimization routines. The reason for this is most 
likely that posseses very nearly degenerate minima. 
Also, numerical noise in the CMBFAST code can cause 
the gradient information to become unreliable near sta- 
tionary points, causing the gradient based methods to 
become trapped in points which are not true minima. 

We have also found that even for the simulated an- 
nealing algorithm, many power spectrum calculations are 
usually necessary in order to obtain a good estimate of 
the global minimum. Without a fast optimization al- 
gorithm it is very difficult to extract reliable parameter 
estimates from CMBR power spectra, and even with a 
routine like SA, it is computationally very demanding as 
soon as the parameter space is realistically large (9-10 
dimensional) . 

Note that all of the above calculations rely on stochas- 
tic methods in that they start out at completely random 
points in the allowed parameter space. This is very differ- 
ent from the method used by Oh, Spergel and Hinshaw 
[^, who use as the initial point a fit obtained by the 
chi-by-eye method and then optimize that initial guess 
using a second order method. This method surely makes 
the optimization algorithm converge faster, but suffers 
greatly from the problem of how to choose the initial 
point without biasing the outcome (i.e. making the algo- 
rithm find a minimum which is not global). We believe 
that using stochastic optimization is a much more robust 
way of optimization. 

Interestingly, there are other modern algorithms for 
optimization which work along some of the same princi- 
ples as SA, for instance genetic algorithms |2^. Given 
the magnitude of the computational challenge provided 
by upcoming CMBR data, it appears worthwhile to ex- 
plore the potential of such new algorithms. 
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